GE: description of the NLToolset system as used for MUC-3
Abstract
The GE NLToolset aims to extract and derive useful information from text using a knowledge-based, domain-independent core of text processing tools, customizing the existing programs for each new task. The program achieves this transportability by using a core knowledge base and lexicon that adapt easily to new applications, along with a flexible text processing strategy that tolerates gaps in the program's knowledge base. The NLToolset's design gives each system component access to a rich hand-coded knowledge base, but each component applies the knowledge selectively, avoiding the computation that a complete analysis of each text would require. The architecture of the system supports multiple levels of language analysis, from rough skimming to in-depth conceptual interpretation.

The NLToolset, in its first version, was behind GE's participation in the MUCK-II conference. Since MUCK-II, the Toolset, now in Release 2.1, has expanded to include a number of new capabilities, including a text pre-processor for easier customization and better performance, broader lexical and syntactic coverage, and a domain-independent module for applying word-sense preferences in text. In addition to being tested in several new application areas, the Toolset has achieved about a tenfold speedup in words per minute over MUCK-II, and can now partially interpret and tag word senses in arbitrary news stories, although it is very difficult to evaluate this task-independent performance. These basic enhancements preceded the other additions, including a discourse processing module, which were made for MUC-3.

The performance of the program on tasks such as MUCK-II and MUC-3 derives mainly from two design characteristics: central knowledge hierarchies and flexible control strategies.

A custom-built 10,000 word-root lexicon and 1000-concept hierarchy provide a rich source of lexical information. Entries are separated by their senses, and contain special context clues to help in the sense-disambiguation process. A morphological analyzer contains semantics for about 75 affixes, and can automatically derive the meanings of inflected entries not separately represented in the lexicon. Domain-specific words and phrases are added to the lexicon by connecting them to higher-level concepts and categories present in the system's core lexicon and concept hierarchy. Lexical analysis can also be restricted or biased according to the features of a domain. This is one aspect of the NLToolset that makes it highly portable from one domain to another.

The language analysis strategy in the NLToolset uses fairly detailed, chart-style syntactic parsing guided by conceptual expectations. Domain-driven conceptual structures provide feedback in parsing, contribute to scoring alternative interpretations, help recovery from failed parses, and tie together information across sentence boundaries. The interaction between linguistic and conceptual knowledge sources at the level of linguistic relations, called "relation-driven control", was a key system enhancement before MUC-3.

In addition to flexible control, the design of the NLToolset allows each knowledge source to influence different stages of processing. For example, discourse processing starts before parsing, although many decisions about template merging and splitting are made after parsing. This allows context to guide language analysis, while language analysis still determines context.
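As a rough illustration of the lexicon design described in the abstract, the Python sketch below shows one way sense-separated entries with context clues, a small affix table, and a concept hierarchy could fit together. Every name in it (`Concept`, `Sense`, `Lexicon`, `lookup`, `best_sense`), the two-suffix table, and the sample words and concepts are hypothetical and chosen only for illustration; the abstract does not describe the NLToolset's actual data structures or algorithms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in a toy concept hierarchy (hypothetical, not the NLToolset's)."""
    name: str
    parent: Optional["Concept"] = None

    def ancestors(self) -> list:
        # Walk up the hierarchy, nearest ancestor first.
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


@dataclass
class Sense:
    """One sense of a word root, with context clues used for disambiguation."""
    concept: Concept
    clues: frozenset = frozenset()


@dataclass
class Lexicon:
    entries: dict = field(default_factory=dict)
    # Illustrative affix table (assumption): suffix -> gloss for the derived form.
    suffixes: dict = field(default_factory=lambda: {"ing": "ongoing", "ed": "past"})

    def add(self, root, concept, clues=()):
        # Attach a word root to an existing concept; one Sense per call.
        self.entries.setdefault(root, []).append(Sense(concept, frozenset(clues)))

    def lookup(self, word):
        """Return (senses, gloss); strip a known suffix for unlisted forms."""
        if word in self.entries:
            return self.entries[word], None
        for suffix, gloss in self.suffixes.items():
            root = word[: -len(suffix)]
            if word.endswith(suffix) and root in self.entries:
                return self.entries[root], gloss
        return [], None

    def best_sense(self, word, context):
        """Prefer the sense whose clues overlap most with the context words."""
        senses, _ = self.lookup(word)
        if not senses:
            return None
        return max(senses, key=lambda s: len(s.clues & set(context)))


# Usage: a domain-specific word is connected to a higher-level concept.
event = Concept("event")
attack = Concept("attack", parent=event)
failure = Concept("failure", parent=event)

lexicon = Lexicon()
lexicon.add("bomb", attack, clues={"explosion", "embassy"})
lexicon.add("bomb", failure, clues={"movie", "audience"})

chosen = lexicon.best_sense("bombed", ["guerrillas", "bombed", "the", "embassy"])
print(chosen.concept.name, chosen.concept.ancestors())
```

The inflected form "bombed" is not stored in this toy lexicon; it is resolved through the suffix table, and the context words then select between the two senses of the root.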
Similar resources
GE-CMU: description of the TIPSTER/SHOGUN system as used for MUC-4
The GE-CMU team is developing the TIPSTER/SHOGUN system under the government-sponsored TIPSTER program, which aims to advance coverage, accuracy, and portability in text interpretation. The system will soon be tested on Japanese and English news stories in two new domains. MUC-4 served as the first substantial test of the combined system. Because the SHOGUN system takes advantage of most of...
GE NLToolset: description of the system as used for MUC-4
The GE NLToolset is a set of text interpretation tools designed to be easily adapted to new domains. This report summarizes the system and its performance on the MUC-4 task.
GE NLToolset: MUC-3 test results and analysis
This paper reports on the GE NLToolset customization effort for MUC-3, and analyzes the results of the TST2 run. Although our own tests had shown steady improvement between TST1 and TST2, our official scores on TST2 were lower than on TST1. The analysis of this unexpected result explains some of the details of the MUC-3 test, and we propose ways of looking at the scores to distinguish d...
Description of Lockheed Martin's NLToolset as Applied to MUC-7 (AATM7)
The NLToolset has been used to build a variety of information extraction applications, ranging from military message traffic to newswire accounts of corporate activity. AATM7 is an acronym for As Applied To MUC-7. AATM7 was not tailored specifically for MUC-7, but rather represents the NLToolset in a state of flux, as TIPSTER experimentation and the delivery of a real-world application were tak...
Publication date: 1991